
32 research outputs found

A unifying framework for seed sensitivity and its application to subset seeds

Author: A. Finkelstein
A.V. Aho
B. Brejova
B. Ma
D. Brown
G. Kucherov
I.H. Yang
J. Buhler
J. Xu
J.D. Ullman
K. Choi
K.P. Choi
S. Altschul
S. Burkhardt
W. Chen
W.J. Kent
Publication venue: World Scientific Pub Co Pte Lt
Publication date: 01/01/2004

We propose a general approach to computing seed sensitivity that can be applied to different definitions of seeds. It treats separately the three components of the seed sensitivity problem (a set of target alignments, an associated probability distribution, and a seed model), each specified by a distinct finite automaton. The approach is then applied to a new concept of subset seeds, for which we propose an efficient automaton construction. Experimental results confirm that sensitive subset seeds can be efficiently designed using our approach and can then be used in similarity search, producing better results than ordinary spaced seeds.
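The idea of computing sensitivity by running alignments through an automaton can be illustrated, in a much simplified form, by the Python sketch below. It assumes a Bernoulli model (each column of a gapless alignment of fixed length is a match independently with probability p) and an ordinary spaced seed written with '1' for required matches and any other character for don't-care positions. The function name `seed_sensitivity` is hypothetical, and tracking the last w−1 columns as states is a brute-force stand-in for, not a reproduction of, the authors' automaton construction for subset seeds.

```python
from collections import defaultdict


def seed_sensitivity(seed: str, length: int, p: float) -> float:
    """Probability that `seed` hits a random gapless alignment of `length` columns.

    Assumption (Bernoulli model): each column is independently a match (1)
    with probability p, otherwise a mismatch (0). In `seed`, '1' marks a
    position that must be a match; any other character is a don't-care.
    """
    w = len(seed)
    care = [i for i, c in enumerate(seed) if c == '1']
    if length < w:
        return 0.0  # the seed cannot fit, so it can never hit
    # Each state is the tuple of the last (up to w-1) columns read so far,
    # and the dict holds probability mass only for prefixes with no hit yet.
    dist = {(): 1.0}
    for _ in range(length):
        new = defaultdict(float)
        for state, prob in dist.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                window = state + (bit,)
                if len(window) == w and all(window[j] == 1 for j in care):
                    continue  # the seed hits here: this mass leaves the "no hit" set
                key = window[-(w - 1):] if w > 1 else ()
                new[key] += prob * q
        dist = new
    # Sensitivity = 1 - probability of reaching the end without any hit.
    return 1.0 - sum(dist.values())
```

For example, `seed_sensitivity("11", 2, 0.5)` returns 0.25, the probability that both columns match. Because the state set can grow as 2^(w−1) for a seed of span w, this version is only practical for short seeds; avoiding that blow-up is what motivates more compact seed automata.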

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

PubMed Central

Decoding HMMs using the k best paths: algorithms and applications

Author: A Bairoch
A Krogh
B Brejova
D Eppstein
D Golod
Daniel G Brown
Daniil Golod
G Tusnady
L Kall
L Rabiner
M Rapp
P Fariselli
R Durbin
R Sramek
Publication venue: BioMed Central

Crossref

PubMed Central

SEARCHPATTOOL: a new method for mining the most specific frequent patterns for binding sites with application to prokaryotic DNA sequences

Author: A Brazma
A Califano
B Brejova
DR Cavener
E Eskin
Fathi Elloumi
FP Roth
G Pavesi
G Thijs
GZ Hertz
H Salgado
I Jonassen
I Rigoutsos
J Van Helden
M Burset
M Tompa
Martha Nason
PA Pevzner
R Agrawal
S Sinha
TL Bailey
Y Makita
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Computational methods that predict transcription factor binding sites (TFBS) with exhaustive algorithms are guaranteed to find the best patterns, but they are often limited to short patterns or impose constraints on the pattern type. Many binding-site patterns in prokaryotic species are not well characterized but are known to be long, spanning 16–30 base pairs (bp), and to contain at least 2 conserved bases. The short length of prokaryotic promoters (about 400 bp), together with our interest in studying small sets of genes that could form clusters of co-regulated genes identified in microarray experiments, led us to develop a new exhaustive algorithm targeting these long patterns. Results: We present Searchpattool, a new method to search for and select the most specific (conservative) frequent patterns. The method imposes no restrictions on the density or structure of the pattern. The best patterns (motifs) are selected using several statistics, including a new application of a z-score based on the number of matching sequences. We compared Searchpattool with other well-known algorithms on a group of 14 Bacillus subtilis input sequences and found that, in our experiments, Searchpattool always achieved the best performance scores. Conclusion: Searchpattool is a new pattern-discovery method for transcription factor binding sites in species or genes with short promoters. It outputs the most specific significant patterns and helps biologists choose the best candidates.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Hit integration for identifying optimal spaced seeds

Author: B Brejova
B Ma
DYF Mak
FP Preparata
G Kucherov
H Anton
I Herms
IH Yang
J Buhler
J Stoer
J Xu
J Yang
KP Choi
L Ilie
L Zhou
M Farach-Colton
M Li
Seong-Bae Park
SF Altschul
TF Smith
U Keich
WJ Kent
Won-Hyoung Chung
WR Pearson
X Gao
Y Sun
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

A draft genome sequence of the elusive giant squid, Architeuthis dux

Author: Albertin C. B.
Alexander G. C.
Antunes A.
Baril T.
Barrio-Hernandez I.
Blagoev B.
Brejova B.
Campos A.
Castro L. F. C.
Chu C.
Couto A.
Da Fonseca R. R.
Fedrigo O.
Frazao B.
Gardner P.
Gilbert M. T. P.
Hayward A.
Hoving H. -J.
Jarvis E.
Li Q.
Ma B.
Machado A. M.
Musacchia F.
Nielsen R.
Osorio H.
Patricio M.
Penaloza F.
Petersen B.
Pisani D.
Rahman M. Z.
Rasmussen S.
Ribeiro A. M.
Rocha S.
Sanges R.
Sicheritz-Ponten T.
Silva F.
Simakov O.
Strugnell J. M.
Tafur-Jimenez R.
Vinar T.
Vinther J.
Winkelmann I.
Wu Y.
Zhang G.
Publication venue: Oxford University Press (OUP)
Publication date: 21/11/2019

Background: The giant squid (Architeuthis dux; Steenstrup, 1857) is an enigmatic giant mollusc with a circumglobal distribution in the deep ocean, except in the high Arctic and Antarctic waters. The elusiveness of the species makes it difficult to study, so a genome assembly for this deep-sea-dwelling species should help resolve several open evolutionary questions. Findings: We present a draft genome assembly built from 200 Gb of Illumina reads, 4 Gb of Moleculo synthetic long reads, and 108 Gb of Chicago libraries, with a final size matching the estimated genome size of 2.7 Gb and a scaffold N50 of 4.8 Mb. We also present an alternative assembly that includes 27 Gb of raw reads generated on the Pacific Biosciences platform. In addition, to assist genome annotation, we sequenced the proteome of the same individual and RNA from 3 different tissue types from 3 other squid species (Onychoteuthis banksii, Dosidicus gigas, and Sthenoteuthis oualaniensis). We annotated 33,406 protein-coding genes supported by evidence, and genome completeness estimated by BUSCO reached 92%. Repetitive regions cover 49.17% of the genome. Conclusions: This annotated draft genome of A. dux provides a critical resource for investigating the unique traits of this species, including its gigantism and key adaptations to deep-sea environments.

OceanRep

Investigo

Woods Hole Open Access Server

ResearchOnline at James Cook University

Copenhagen University Research Information System

eScholarship - University of California

Sissa Digital Library

Open Research Exeter

Repositório Aberto da Universidade do Porto

NORA - Norwegian Open Research Archives

Explore Bristol Research

The common marmoset genome provides insight into primate biology and evolution

We report the whole-genome sequence of the common marmoset (Callithrix jacchus). The 2.26-Gb genome of a female marmoset was assembled using Sanger read data (6×) and a whole-genome shotgun strategy. A first analysis has permitted comparison with the genomes of apes and Old World monkeys and the identification of specific features that might contribute to the unique biology of this diminutive primate, including genetic changes that may influence body size, frequent twinning and chimerism. We observed positive selection in growth hormone/insulin-like growth factor genes (growth pathways), respiratory complex I genes (metabolic pathways), and genes encoding immunobiological factors and proteases (reproductive and immunity pathways). In addition, both protein-coding and microRNA genes related to reproduction exhibited evidence of rapid sequence evolution. This genome sequence for a New World monkey enables increased power for comparative analyses among available primate genomes and facilitates biomedical research application. © 2014 Nature America, Inc

Louisiana State University

Improving model construction of profile HMMs for remote homology detection through structural alignment

Author: A Andreeva
A Bateman
A Krogh
AC Camproux
Alberto MR Dávila
B Brejova
B Knudsen
B Qian
C Bystroff
C Do
C Notredame
D Feng
D Haft
F Altschul
F Goyon
Gerson Zaverucha
H Mamitsuka
I Letunic
J Espadaler
J Gough
J Park
J Shi
J Söding
J Thompson
JD Thompson
JR Beck
Juliana S Bernardes
K Bae
K Karplus
K Katoh
K Lin
K Mizuguchi
K Sjolander
L Holm
L Rabiner
M Gribskov
M Helen
M Madera
M Mendel
M Wistrand
O Sullivan
P Bourne
P Nuin
R Edgar
R Hughey
R Karchin
S Altschul
S Eddy
S Jones
T Attwood
T Mitchell
V Alexandrov
Vítor S Costa
W Majoros
W Taylor
WR Pearson
Y Hou
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Remote homology detection is a challenging problem in bioinformatics. Arguably, profile hidden Markov models (pHMMs) are one of the most successful approaches to this problem: pHMM packages have a relatively low computational cost and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could affect the performance of pHMMs trained on proteins in the Twilight Zone, since structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Here, we assess the impact of using structural alignments on pHMM performance. Results: We used the SCOP database for our experiments. Structural alignments were obtained with the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained with CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over superfamilies. Performance was evaluated using ROC curves and paired two-tailed t-tests. Conclusion: pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignments in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher-quality pHMMs. On the other hand, the sensitivity of these tools is still quite low in these low-identity regions. Our results suggest several possible directions for improvement in this area.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The completion of the Mammalian Gene Collection (MGC)

Author: Astashyn A.
Baertsch R.
Bhat N.
Blakesley R. W.
Bonner T. I.
Bouffard G. G.
Brejova B.
Brent M.
Brown G.
Brownstein M.
Buetow K. H.
Chuah E.
Collins F. S.
Comstock C. L.
Deng A.
Deng M.
Derge J. G.
Dickson M. C.
Diekhans M.
Farrell C.
Feingold E. A.
Garcia A. M.
Gerhard D. S.
Ghamsari L.
Gibbs R. A.
Good P. J.
Green E. D.
Grimwood J.
Gruber C. E.
Gunaratne P. H.
Hart J.
Harte R.
Haussler D.
Hirst M.
Hudson J.
Jacob H.
Jang W.
Kent J.
Kloske D.
Landrum M.
Langton L.
Lazar J.
Lebeau A.
Lewis J.
Lin C.
Ma K.
Maglott D.
Mah D.
Maidak B. L.
Mandich A.
Marsh A.
McPherson J.
Mello E.
Misquitta L.
Moksa M.
Moore T.
Mullikin J.
Muratet M.
Murphy M.
Murphy T.
Murray R. R.
Muzny D.
Myers R. M.
Pang J.
Pardes E.
Pennacchio C.
Phan L.
Pruitt K. D.
Rajput B.
Rasooly R.
Riddick L.
Robinson C.
Rodriguez A. C.
Salehi-Ashtiani K.
Schaefer C. F.
Schmutz J.
Schreiber K.
Sethupathy P.
Shapiro N.
Shenmen C. M.
Shoaf D.
Sieja S.
Siepel A.
Simmons B.
Smith M. R.
Stevens M.
Taylor G.
Temple G.
Tse K.
van Baren M. J.
Wagner L.
Ward M.
Webb D.
Weber J.
Wei C.
Wu J.
Wu W.
Yankie L.
Young A. C.
Zeng T.
Publication venue: Cold Spring Harbor Laboratory
Publication date: 01/12/2009

Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and for at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.

Cold Spring Harbor Laboratory Institutional Repository

Reconstructing histories of complex gene clusters on a phylogeny

Author: Brejova B.
Siepel A.
Song G.
Vinar T.
Publication venue: Mary Ann Liebert Inc
Publication date: 01/01/2009

Clusters of genes that have evolved by repeated segmental duplication present difficult challenges throughout genomic analysis, from sequence assembly to functional analysis. These clusters are one of the major sources of evolutionary innovation, and they are linked to multiple diseases, including HIV and a variety of cancers. Understanding their evolutionary histories is key to applying comparative genomics methods in these regions of the genome. We propose a probabilistic model of gene cluster evolution on a phylogeny and an MCMC algorithm for reconstructing duplication histories from genomic sequences in multiple species. Several projects are underway to obtain high-quality BAC-based assemblies of duplicated clusters in multiple species, and we anticipate using our methods to analyze them.

arXiv.org e-Print Archive

Cold Spring Harbor Laboratory Institutional Repository

New Bounds for Motif Finding in Strong Instances

Author: A. Panconesi
B. Brejova
C. McDiarmid
G.Z. Hertz
M. Li
W.J. Hoeffding
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

Crossref